| Name | Version | Summary | Date |
| --- | --- | --- | --- |
| vllm | 0.6.4.post1 | A high-throughput and memory-efficient inference and serving engine for LLMs | 2024-11-15 18:43:33 |
| vllm-rocm | 0.6.3 | A high-throughput and memory-efficient inference and serving engine for LLMs with AMD GPU support | 2024-10-15 17:17:24 |
| vllm-flash-attn | 2.6.2 | Forward-only flash-attn | 2024-09-05 20:36:33 |
| vllm-xft | 0.5.5.0 | A high-throughput and memory-efficient inference and serving engine for LLMs | 2024-08-29 08:38:30 |
| vllm-acc | 0.4.21716571491.2888474 | A high-throughput and memory-efficient inference and serving engine for LLMs | 2024-05-24 17:35:42 |
| vllm-online | 0.4.2 | A high-throughput and memory-efficient inference and serving engine for LLMs | 2024-04-29 02:49:29 |
| tilearn-infer | 0.3.3 | A high-throughput and memory-efficient inference and serving engine for LLMs | 2024-04-22 03:24:24 |
| hive-vllm | 0.0.1 | a | 2024-02-28 19:44:57 |
| vllm-consul | 0.2.1 | A high-throughput and memory-efficient inference and serving engine for LLMs | 2023-10-26 07:04:42 |
| vllm-py | 0.0.1 | A high-throughput and memory-efficient inference and serving engine for LLMs | 2023-06-19 03:47:08 |
| woosuk-vllm-test | 0.1.1 | vLLM: Easy, Fast, and Cheap LLM Serving for Everyone | 2023-06-18 20:09:46 |